I would like to do statistical analysis on an Excel file. Rather than learn a new programming language VBA (Visual Basic for Applications), I would rather work with Excel files in Python. This post shows how to read and write Excel files with Python.

1. OpenPyXL

The Python library openpyxl is designed for reading and writing Excel xlsx/xlsm/xltx/xltm files. Refer to openpyxl documentation for its usage.

Use the following command to install openpyxl:

$ sudo pip install openpyxl

BTW, xlrd and xlwt are for reading and writing spreadsheet files compatible with older Microsoft Excel files (i.e., .xls) respectively.

2. Read an Excel file as lists

2.1 Get a worksheet

load_workbook(...) is used to open an Excel file and return a workbook.

from openpyxl import load_workbook

load_workbook(filename, read_only=False, keep_vba=False, data_only=False, guess_types=False)

# for instance
wb = load_workbook(file_workbook, read_only=True)

In general, a workbook contains several worksheet. Get a worksheet with a given sheet name,

ws = wb[sheet_name]

2.2 Read to a list of lists

ws.iter_rows(...) is used to iterate over cells in the worksheet ws. For instance,

iter_rows(range_string=None, row_offset=0, column_offset=0)

# read the worksheet to a list of lists
lists = list()

for row in ws.iter_rows(): # if wanna skip the header, min_row=2
    # select fields
    selected_fields = [
                    row[IDX_CARD_ID],
                    row[IDX_NAME],
                    row[IDX_TIME],
                    row[IDX_IN_OUT]
                    ]

    # deal with blank fields; some fields might contain Chinese characters
    lists.append([item.value.encode('utf8') if item.value else None for item in selected_fields])

Get values with a given cell range,

cell_range = ws['B2':'B7']

3. Write to an Excel file

4. Pandas

4.1 Read excel

Use pandas.read_excel,

pandas.read_excel(io, sheet_name=0, *, header=0, names=None, index_col=None, usecols=None, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, parse_dates=False, date_parser=_NoDefault.no_default, date_format=None, thousands=None, decimal='.', comment=None, skipfooter=0, storage_options=None, dtype_backend=_NoDefault.no_default, engine_kwargs=None)

Read specific sheets, sheet_name is by default set to 0 meaning load the first sheet.

# Read Multiple sheets
dict_df = pd.read_excel(filename, sheet_name=['Technologies','Schedule'])

# Get DataFrame from Dict
technologies_df = dict_df .get('Technologies')

Header. By default (header=0), it considers the first row from excel as a header and used it as DataFrame column names. header=None is considered the first row from excel as a data record.

# header = 0
    序号    姓名
0    1   卓万顺
1    2   林宇涛

# heaer = None
     0     1
0   序号    姓名
1    1   卓万顺
2    2   林宇涛

# heaer = 1
     1   卓万顺
0    2   林宇涛

Read specific columns, usecols takes values int, str, list-like, or callable default None.

df = pd.read_excel(filename, usecols=['Courses', 'Duration'])
df = pd.read_excel(filename, usecols='B:D')
df = pd.read_excel(filename, usecols=[0,2])

Read specific rows,

df = pd.read_excel(filename, skiprows=2)
df = pd.read_excel(filename, skiprows=[1,3])

Set column as index, index_col takes values int, list of int, default None,

df = pd.read_excel(filename, index_col=0)

4.2 Write excel

Use pandas.DataFrame.to_excel,

DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, inf_rep='inf', freeze_panes=None, storage_options=None, engine_kwargs=None)

Some examples,

df.to_excel(filename, index=False) # Export DataFrame to Excel with No Index
本文系Spark & Shine原创,转载需注明出处本文最近一次修改时间 2023-11-08 01:40

results matching ""

    No results matching ""